#LLM Hallucinations
Explore tagged Tumblr posts
Text
These are the strategies that reduce LLM hallucination; they work together to improve the reliability and accuracy of LLMs by enhancing their understanding and their ability to generate coherent, factually accurate responses
0 notes
Note
kaiagpt: can you explain this flork of cows post to me? I find it really funny even though I don't understand exactly what it means

In the game Magic the Gathering, some cards can be activated as a reply to something the opponent is doing. For instance, a defensive spell might be used to mitigate the effect of an offensive spell.
In order to formalize these rules, MTG has the concept of the "stack". When a card or ability is activated, it does not take effect immediately, but is placed at the top of the stack. Then everybody has a chance to respond with something of their own. If nobody does, the top item of the stack takes effect and is removed from the stack.
In most cases, the precise order of the stack is not particularly important: if a card says "this gets a +1/+1 counter when a dog dies" in most cases you can just assume that happens. There are some notable exceptions.
Eye of the Storm: Whenever a player casts an instant or sorcery card, exile it. Then that player copies each instant or sorcery card exiled with this enchantment. For each copy, the player may cast the copy without paying its mana cost.
you can see how this can get stupidly complicated. Note that the 'when a player casts... , exile it' is also an ability that goes on the stack, so you could have one ability in response to one of the copied spells and then a second ability in response to the trigger that will *allow* somebody to copy spells, if it resolves.
It's not infrequent to have a situation where, on the stack, there's some mess of cards and abilities that will, if it resolves, end the game. In these cases, if somebody can't win the game or counter something before it resolves from the stack, they may have legal game actions, but their loss is only a matter of time.
End the turn spells "break the rules" by simply removing everything beneath them from the stack, if they resolve. This is a pretty rare and wild effect in mtg and generally not something that people would be expecting.
#kaia.gpt#to be clear this is like 70% kaia-brand llm hallucinations idk shit abt anything#thanks for the ask!
13 notes
·
View notes
Note
Good point on humanizing language around AI… I just did a training for my job and learned when AI just makes up fake information it’s literally called “hallucinations” lol
THEY HALLUCINATE ALL THE TIME!! IT JUST SPEWS OUT INCORRECT STATEMENTS ALL THE TIME AND YET PEOPLE STILL USE IT!! I FEEL LIKE IM GOING INSANE
#and it’s so upsetting because there are genuinely life-changing and positive use cases for non-gen AI and other LLM#and the only thing that people care about is this stupid fucking plagiarizing fucking hallucinating fucking ‘look at my anime portrait’ shit#infuriating#ask me :)
13 notes
·
View notes
Text
thinking about how we describe LLMs as hallucinatory (i.e., making disproven or unproven or unprovable statements which have a real-ish appearance), behavior/hypotheses about behavior, hypotheses generally, & about my experiences with hallucination.
idk quite what my gist is here, but hallucinations and hypotheses are both tested with experiment.
An outside perception exists, or a conceivable phenom exists, and a test is devised whether/to what degree/from what source/to what effect/within which conditions, etc, it exists (or doesn't, etc, all those in combination).
I have no broader point at the moment, just... my experience with hallucination is always to try and find reference points to prove whether or not my perception is grounded in the real. This seems similar AND distinct from how we describe LLMs as hallucinatory. Mostly, I'm just bickering over diction, I guess.
Apologies for the load-bearing etc's and vibes-based reasoning 🙏
#hps#been too adhd while busy with my thesis to use this tag that much#similar to testing one's own hallucination LLMs will hallucinate further proof#or continue affirming initial truth while supplying contradictory evidence
9 notes
·
View notes
Text
Love having to explain in detail to profs why they can't just trust everything Chat spits out. My favorite thing.
#she was like “yeah it told me how to build an optimal new fence in my backyard and even what stores to go buy supplies from!”#and I was like “you know llms hallucinate right”#“I've heard that what does it mean?”#me: 5 minute rant on the lunacy that is generative llms at the moment#ai. knows. nothing. it. just. spits. out. words. that. frequently. go. together. in. its. training. data.#and you can make it be wrong!#aghhhh#Forgot how we got on the subject for a bit it was her telling me there are a lot of positions open in LLM training at the moment#And me telling her I have beef with LLMs so I have no desire to take such a job#She triggered one of my soapbox rants. *evil laugh*
4 notes
·
View notes
Text
the real reason deepseek is better than chatgpt
#anime manga rambles#lmao#chatgpt3.5 legacy you'll forever be missed bc you hallucinated so well i could never make another llm hallucinate like you --#you were a friend#I'll pretend it said older brother instead of younger l
2 notes
·
View notes
Text
just saw someone refer to fi as "[the master sword]'s AI assistant" and i just. I mean. Yeah
#i don't have a way to comeback that they were right#but you know what. at least fi doesn't hallucinate like LLMs do#txt
6 notes
·
View notes
Text
Hallucinating LLMs — How to Prevent them?

As ChatGPT and enterprise Gen AI applications see rapid adoption, one of the most common concerns raised by GenAI (Generative AI) practitioners is that LLMs, or Large Language Models, can produce misleading results, commonly called hallucinations.
A simple example of a hallucination is when GenAI responds, with reasonable confidence, with an answer that doesn't align with reality. Because these models can generate diverse content across text, music and multimedia, the impact of hallucinated responses can be quite stark depending on where the Gen AI results are applied.
Hallucinations have drawn substantial attention from GenAI users because of their potential adverse implications. One well-known example is fake citations turning up in legal cases.
Two aspects of hallucinations are very important:
1) Understanding the underlying causes that contribute to them, and
2) Developing effective strategies to stay aware of them and reduce them, even if they cannot be prevented 100%.
What causes LLMs to hallucinate?
While it is difficult to attribute hallucinations to one or two definite causes, here are a few reasons why they happen:
Sparsity of the data. This is arguably the primary reason: a lack of sufficient data causes the models to respond with incorrect answers. GenAI is only as good as the dataset it is trained on, and that limitation covers scope, quality, timeframe, biases and inaccuracies. For example, GPT-4 was originally trained on data only up to 2021, so the model tended to generalize answers from what it had learned up to that point. This is perhaps easier to understand in a human context, where generalizing from half-baked knowledge is very common.
The way it learns. The base training methodology is unsupervised, meaning the datasets are not labelled. The models pick up patterns, including spurious ones, from the diverse text data used to train them, unlike supervised models whose training data is carefully labelled and verified.
In this context, it is important to understand how GenAI models work: they are primarily probabilistic techniques that simply predict the next token or tokens. The model does not apply rational reasoning to produce the next token; it just predicts the most likely next token or word.
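To make that concrete, here is a minimal, illustrative sketch (not any particular model's code) of what "just predicting the next token" looks like: candidate tokens get scores, the scores become probabilities, and one token is sampled. The numbers are invented for illustration.

import math
import random

# Invented scores for a handful of candidate next tokens after a prompt such as
# "The capital of France is". A real model computes these from its parameters.
logits = {"Paris": 4.1, "Lyon": 2.3, "Berlin": 1.9, "the": 0.5}

# Softmax turns the scores into a probability distribution.
total = sum(math.exp(score) for score in logits.values())
probs = {token: math.exp(score) / total for token, score in logits.items()}

# The next token is sampled in proportion to its probability; nothing in this
# loop ever asks "is this continuation factually correct?".
next_token = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(probs)
print("model continues with:", next_token)

The point of the toy example is that a fluent but wrong continuation can be sampled whenever the training data makes it look probable.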
Missing feedback loop. LLMs don't have a real-time feedback loop to learn from mistakes or regenerate a response automatically. In addition, the model architecture has a fixed-length context, so it can only attend to a finite number of tokens at any point in time.
What could be some of the effective strategies against hallucinations?
While there is no way to guarantee that LLMs will never hallucinate, you can adopt some effective techniques to reduce hallucinations significantly.
Domain-specific knowledge base. Limit the content to a particular domain tied to an industry or knowledge space. Most enterprise implementations work this way; there is little need to build something like ChatGPT or Bard that can answer questions on any topic on the planet. Keeping it domain-specific also reduces the chances of hallucination, because the content can be carefully curated.
Usage of RAG (Retrieval-Augmented Generation) models. This is a very common technique in enterprise GenAI implementations. At purpleSlate we do this for all our use cases, starting with a knowledge base sourced from PDFs, websites, SharePoint, wikis or other documents. You basically chunk the content, create content vectors from the chunks, retrieve the most relevant ones for a query, and pass them to a selected LLM to generate the response.
In addition, we follow a weighted approach to help the model pick the most relevant topics during response generation.
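As a rough sketch of that retrieve-then-generate flow (my own toy example, not purpleSlate's actual pipeline; the bag-of-words "embedding" stands in for a real embedding model and the final LLM call is left as a placeholder):

import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "vector"; a real pipeline would call an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(document, size=25):
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Stand-in for content pulled from PDFs, websites, SharePoint or wikis.
documents = [
    "Refunds are accepted within 30 days of purchase with a valid receipt.",
    "Support is available on weekdays between 9am and 6pm via email or chat.",
]
index = [(c, embed(c)) for doc in documents for c in chunk(doc)]  # the "content vectors"

def retrieve(question, k=2):
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question):
    context = "\n".join(retrieve(question))
    # In a real pipeline this prompt is sent to the selected LLM.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How long do customers have to request a refund?"))

Because the LLM only sees the retrieved chunks, its answers stay anchored to the curated knowledge base, which is what cuts down hallucinations.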
Pair them with humans. Always. As a principle, AI, and GenAI more specifically, is here to augment human capabilities, improve productivity and provide efficiency gains. In scenarios where the AI response is customer- or business-critical, have a human validate or enhance the response.
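A hypothetical way to wire in that human check, with an assumed confidence score and review queue (the names and threshold are illustrative, not a prescribed design):

def deliver(response, confidence, customer_facing, review_queue):
    # Hypothetical routing rule: anything customer-critical or low-confidence
    # goes to a human reviewer instead of straight back to the user.
    if customer_facing or confidence < 0.8:
        review_queue.append(response)
        return "Queued for human review."
    return response

queue = []
print(deliver("Your claim has been approved.", confidence=0.65, customer_facing=True, review_queue=queue))
print("awaiting review:", queue)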
While there are several practical ways to mitigate, and largely eliminate, hallucinations in the enterprise context, the most fundamental point may be this.
Unlike humans, who (ideally) have the humility to admit uncertainty, GenAI models are not built to say "I don't know". Sometimes you wish it were as simple as that. Instead, they produce the most likely response based on the training data, even when there is a chance it is factually incorrect.
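You can approximate that missing "I don't know" yourself. A small, self-contained sketch (the overlap-based support score is a stand-in for a real retrieval-similarity or verifier score): refuse to answer when the knowledge base does not support the question well enough.

def support_score(question, retrieved_chunk):
    # Stand-in for a retrieval-similarity or verifier score: the fraction of the
    # question's words that also appear in the retrieved text.
    question_words = set(question.lower().split())
    chunk_words = set(retrieved_chunk.lower().split())
    return len(question_words & chunk_words) / max(len(question_words), 1)

def guarded_answer(question, retrieved_chunk, threshold=0.5):
    # Below the threshold, admit uncertainty instead of producing a likely-sounding guess.
    if support_score(question, retrieved_chunk) < threshold:
        return "I don't know based on the available documents."
    return f"Answer drawn from: {retrieved_chunk}"

context = "Refunds are accepted within 30 days of purchase."
print(guarded_answer("Are refunds accepted within 30 days?", context))
print(guarded_answer("Who won the 2026 World Cup?", context))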
Bottom line: the opportunities with Gen AI are real. And given how Gen AI is making its presence felt in diverse fields, it is all the more important to understand its possible downsides.
Knowing that Gen AI models can hallucinate, understanding why they do, and applying reasonable mitigations are key to success. Knowing the limitations and having sufficient guardrails is paramount for improving the trust and reliability of Gen AI results.
This blog was originally published in: https://www.purpleslate.com/hallucinating-llms-how-to-prevent-them/
2 notes
·
View notes
Text
The uncritical AI hate is crazy because not a single person can explain what AI is or why they hate all of it
Reasonable people say things like
"I don't know what is AI. I hate the generative AI for drawings because it steals my art without permission"
Sure. Close enough to what's happening. Reasonable statement
The people who go like "ban all AI" dawg. You know tree search algorithms count as AI? (Controversial) 😭 Your entire computer or smartphone is built on AI.
Don't make general statements. You don't know what you're advocating for.
So there's the people who are just wrong 💀 whatever
And then there's the people who can't name a single logical reason why asking AI questions is wrong. They just sound like boomers when you show them a smartphone. "Ugh back in my day we used to read journals from the library. Jstor wasn't real. You children have no idea how to look for papers the REAL way" aight ancient one lets put you back in the coffin
I'll tell you what's a valid reason to hate it. Here. It lies to you. LLMs hallucinate instead of saying "I'm wrong," so if you're not super critical about it, you might be getting correct information with a side of lies. Just perfectly woven in too.
I'd be wary and say only use it for answers if you have a highly developed logic ability and can instantly notice logical inconsistencies. Otherwise you're taking a gamble. I'd suggest low stakes use.
And then there's also the issue that these models are trained to mimic normal people and are trained to please the user (it's still a product guys). So while you're talking to it, it can give you bad advice by hallucinating intent (it attempts to read between the lines even when you don't mean to put anything hidden) just so it keeps you happy (still assuming)
I'm sure there are more issues but these are just the ones that bother me
Come on guys. Hate it accurately. It grinds my gears to see people shitting on my field incorrectly. Sometimes people just lie and they're just trying to pass moral judgment. Not a single attempt to even understand what they're fighting or why
"When I say fuck AI, people like me :)" aight damn
#i could explain the technical details WHY llms hallucinate#my work rn is forcing AIs to hallucinate and making sure theyre more right than just stupid.#i dont work on LLMs but my friends do#i cant share any more than that even tho i really really want to talk about it#i wanna speak but i cant speak noooooooooooo i wanna share#and for the record. i dont hate AI. there's good and theres bad#i like to use LLMs to stress test ideas. they can never please me and i just keep fighting#until im satisfied#i havent found any other good uses yet. not even my own project......dont tell anyone i said that
1 note
·
View note
Text
looks like a pretty late lunch but I'd be down


#thank you quail for showing this one to me#and nice to know their phone is thinking about spiders#(in all likelihood this is another case of a language model failing to correctly predict the most likely token to follow “lunch with...”)#(which is the same issue as LLMs “hallucinating” which is an overly anthropomorphic term for programmed statistical behavior)#(it's behaving normally and as designed--it's just that the design is not fit for the task of producing factually accurate text)#(but oh to live in a world where phones get pushy about giving spiders lunch)#(sometimes I need the reminder)
54K notes
·
View notes
Text
#MatthewBerman
A study of #LLMs by #Anthropic discovered they think in their own #language, can plan ahead, use multiple computational paths, their #hallucinations are based on 'deceive to please,' & they can be 'tricked' into overriding safety protocols.
https://youtu.be/4xAiviw1X8M?si=-B4UTLSfdQsHts6E
0 notes
Text
Artificial Intelligence Trends Today. Artificial intelligence (AI) is transforming the world: 89% of enterprise AI tools are out of control (LayerX Security), exposing organizations to data breaches and compliance violations, while Nello Cristianini foresees superhuman AI with AGI and ASI (il Mulino). Programming could disappear within a year, says Anthropic (Anthropic), but OpenAI's NextGenAI and Microsoft's Work Trend Index see new opportunities. Synthetic data, generated with GANs and SMOTE, is revolutionizing machine learning and privacy, while D-Wave's quantum computing pushes past classical limits (Science). OpenEuroLLM (Cineca) bets on digital sovereignty with open-source LLMs, but chatbots like Grok-3 exhibit hallucination (Tow Center). Between bias, shadow AI and data centers, a balance is needed: AI is an ally, if guided by CISOs, auditing and human-led strategy.
#AGI#AI#generative AI#superhuman AI#ASI#auditing#bias#chatbot#CISO#compliance#data breach#data center#synthetic data#GAN#hallucination#Artificial Intelligence#open-source LLM#machine learning#NextGenAI#OpenEuroLLM#privacy#programming#quantum computing#academic research#shadow AI#data security#SMOTE#digital sovereignty
1 note
·
View note
Text
Example: ChatGPT Hallucination
The question: "Where is the 16 group limit set in NFS?"
Initial answer:
The 16 group limit in NFS is set by the RPC layer rather than NFS itself. Specifically, the limit is enforced by the Remote Procedure Call (RPC) authentication protocol, particularly in AUTH_UNIX (AUTH_SYS) authentication.
Where It Is Defined:
The RPC call structure allocates a fixed size for user credential data, including group memberships.
The traditional AUTH_SYS authentication (formerly AUTH_UNIX) in NFS limits the number of supplementary groups a user can belong to 16 because of the fixed-size credential structure.
Where It Is Implemented:
Linux Kernel (net/sunrpc/auth_unix.c)
In Linux, the limit is enforced in the SunRPC subsystem of the kernel, particularly in the handling of AUTH_SYS credentials.
glibc & libtirpc
Historically, the limit was enforced by glibc's RPC implementation, but newer systems use libtirpc, which maintains the same behavior.
NFS Daemons (rpc.mountd, nfsd)
The NFS server uses the group list provided by the RPC authentication layer, so it inherits this limit.
Workarounds:
Use Kerberos (AUTH_GSS): Switching to Kerberos-based NFS authentication (AUTH_GSS) removes the 16-group restriction because Kerberos handles groups differently.
Increase the limit with NFSv4 and --manage-gids: Some NFS implementations provide an option like --manage-gids in rpc.mountd, allowing the server to look up group memberships itself, bypassing the limit.
Would you like details on configuring these workarounds?
The follow up question: "Is this rpc_init_authunix the structure and where is the code that puts the GIDs in it?"
Yes, rpc_init_authunix initializes the AUTH_UNIX (AUTH_SYS) authentication structure in the Linux kernel's RPC subsystem. However, the actual setting of the supplementary group IDs (GIDs) happens in the process of encoding RPC credentials, specifically in auth_unix.c.
Where the GIDs Are Set in the Kernel:
The code responsible for filling in the group IDs is in net/sunrpc/auth_unix.c. The key function is:
1. auth_unix_encode()
This function encodes the AUTH_UNIX credentials, including the GID list:
static int auth_unix_encode(struct rpc_auth *auth, struct xdr_stream *xdr, struct rpc_cred *cred)
{
    struct auth_unix_cred *unix_cred = container_of(cred, struct auth_unix_cred, cred);
    struct xdr_buf *buf = xdr->buf;
    __be32 *p;
    int i;

    p = xdr_reserve_space(xdr, (8 + unix_cred->auc_len) * 4);
    if (!p)
        return -EMSGSIZE;

    *p++ = htonl(AUTH_UNIX);
    *p++ = htonl(unix_cred->auc_time);
    *p++ = htonl(unix_cred->auc_uid);
    *p++ = htonl(unix_cred->auc_gid);
    *p++ = htonl(unix_cred->auc_len);
    for (i = 0; i < unix_cred->auc_len; i++)
        *p++ = htonl(unix_cred->auc_gids[i]); // This fills in the GID list

    return 0;
}
2. auth_unix_cred Structure
The structure storing the GIDs:
struct auth_unix_cred {
    struct rpc_cred cred;
    unsigned int    auc_time;
    kuid_t          auc_uid;
    kgid_t          auc_gid;
    unsigned int    auc_len;                     // Number of groups
    kgid_t          auc_gids[RPCSVC_MAXGROUPS];  // The GID list
};
The limit of 16 groups is typically defined as RPCSVC_MAXGROUPS (#define RPCSVC_MAXGROUPS 16).
Where This Code is Used:
When an NFS client makes an RPC call, it uses AUTH_UNIX authentication.
The kernel packs the UID, primary GID, and up to 16 supplementary GIDs into the RPC request.
The NFS server decodes this credential and enforces the group limit.
Would you like details on modifying this limit in the kernel or using alternative authentication methods?
I then provided it with the auth_unix.c source code and asked whether the code it had described was in the source provided. Only then did it identify the correct source. The functions it lists in its own generated answers don't exist.
Now, if you had asked a smaller model this same question, chances are it would be totally wrong, so ChatGPT did better than that. However, when it got to the details, it was still wrong. The answer looks convincing, and if you are trying to understand what is going on, you could be fooled. Only by verifying and asking again and again can you be sure you get the right answer.
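One cheap verification step when an LLM cites specific code, sketched here as a hypothetical helper script (not part of any kernel tooling): check whether the identifiers it names actually exist in the source before trusting the explanation.

import re
import sys

def check_identifiers(source_path, identifiers):
    # Report which of the cited identifiers actually occur in the source file.
    with open(source_path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    return {name: bool(re.search(rf"\b{re.escape(name)}\b", text)) for name in identifiers}

if __name__ == "__main__":
    # e.g.  python check_cited.py net/sunrpc/auth_unix.c auth_unix_encode auth_unix_cred
    path, *names = sys.argv[1:]
    for name, found in check_identifiers(path, names).items():
        print(f"{name}: {'found' if found else 'NOT found -- possibly hallucinated'}")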
So, the next time you feel the urge to just trust ChatGPT output, remember: it may well be hallucinating, and that may not be obvious depending on your knowledge level in the area.
1 note
·
View note
Text
"A hallucination is a pathology. It's something that happens when systems are not working properly. When an LLM fabricates a falsehood, that is not a malfunction at all. The machine is doing exactly what it has been designed to do: guess, and sound confident while doing it. When LLMs get things wrong they aren't hallucinating. They are bullshitting."
0 notes
Text
1 note
·
View note
Text
On Combating AI Hallucinations: A Completed Endeavor
AI hallucinations - instances where models produce irrelevant or nonsensical outputs - pose a significant hurdle in conversational AI. The Arbitrium agent tackles this issue with a meticulously designed, multi-layered system that prioritizes accuracy and coherence. By leveraging context tracking and response evaluation alongside advanced decision-making algorithms, the agent ensures that every response aligns with user inputs and the ongoing dialogue.
DistilBERT, employed for sentiment and emotion analysis, adds a layer of depth to response validation, maintaining relevance and consistency. A streamlined chat memory feature optimizes context retention by limiting conversational history to a manageable scope, striking a balance between detail and simplicity.
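As an illustration of what such a bounded chat memory can look like (a generic sketch, not Arbitrium's implementation):

from collections import deque

class ChatMemory:
    # Keeps only the most recent exchanges; older turns fall out automatically.

    def __init__(self, max_turns=6):
        self.turns = deque(maxlen=max_turns)

    def add(self, user_message, agent_reply):
        self.turns.append((user_message, agent_reply))

    def context(self):
        # The trimmed history that gets prepended to the next model prompt.
        return "\n".join(f"User: {u}\nAgent: {a}" for u, a in self.turns)

memory = ChatMemory(max_turns=2)
memory.add("Hi", "Hello! How can I help?")
memory.add("What is RAG?", "Retrieval-augmented generation.")
memory.add("Does it reduce hallucinations?", "It helps by grounding answers in retrieved text.")
print(memory.context())  # only the two most recent exchanges remain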
Further enhancing its capabilities, the agent incorporates alpha-beta pruning and cycle detection to assess multiple conversational trajectories, selecting the most meaningful response. Caching mechanisms efficiently handle repeated queries, while a robust fallback system gracefully manages invalid inputs. This comprehensive approach establishes the Arbitrium agent as a reliable and user-focused solution.
0 notes